COMPUTER

This invention relates to computers and in particular to a highly parallel computer having a plurality of interconnected processing units capable of operating in parallel. In particular, for recognition tasks it is required to simulate the operation of a large neural network having, for example, 1000 by 1000 interconnected neural units. A particular feature of such machines which is required to be exploited is their ability to learn to recognise a characteristic of the input signals by an adaptive technique, in response to the error between the output of the machine when it receives a particular input and the output which it is required to produce. Past work has involved the study of a large, highly interconnected network, and more recently it has been shown that many groups of individual networks which are locally highly interconnected can operate satisfactorily with comparatively sparse interconnections between the groups. There has been a problem in setting the memory elements in such systems to give a desired response characteristic, and this problem has been met by the introduction of "back propagation" or "error propagation", a technique defining a procedure by which a system with an arbitrary number of groups of neural units having no direct connection with the inputs or outputs of the system can learn from the input data and from the error between the actual output and the desired output of the network.
The simulation of a neural network falls into two phases: solution and learning.

In the solution phase the input group of numbers, or vector, to a unit (neuron) is multiplied by the corresponding stored elements (synapses) in that unit and the resulting products are summed to form a single number. This number is then applied to a non-linear operator, the output of which forms the output of the unit and is communicated to the input of one or more other units. In a network of units, the outputs of the final units form the output of the network.
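The solution phase just described can be sketched in Python as follows. This is a minimal floating-point illustration only, assuming the logistic function 1/(1 + e^(-x)) as the non-linear operator (the description later names it as one suitable choice); the function names `logistic`, `evaluate_unit` and `evaluate_layer` are hypothetical.

```python
import math


def logistic(x):
    """Non-linear operator 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def evaluate_unit(inputs, synapses):
    """Solution phase for one unit: multiply each input by its stored
    element (synapse), sum the products, then apply the non-linear operator."""
    if len(inputs) != len(synapses):
        raise ValueError("input vector and synapse vector must have equal length")
    total = sum(x * w for x, w in zip(inputs, synapses))
    return logistic(total)


def evaluate_layer(inputs, synapse_rows):
    """Evaluate several units that receive the same input vector."""
    return [evaluate_unit(inputs, row) for row in synapse_rows]
```

For example, `evaluate_unit([1.0, 2.0], [0.5, -0.25])` forms the sum 0.5 - 0.5 = 0 and so returns 0.5, the middle value of the non-linearity.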
The learning phase makes use of the error propagation algorithm. In this phase an error vector is computed from the difference between the output from the network and the desired output, and the stored elements in each unit are updated by the product of the error vector and the input vector giving rise to it.
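For a single layer of units, the update "by the product of the error vector and the input vector" amounts to adding a scaled outer product to the matrix of stored elements. The sketch below assumes a hypothetical scaling factor `rate`; it shows only this single-layer form, and the propagation of errors through intermediate units is treated separately further on.

```python
def error_vector(desired, actual):
    """Error = desired output minus actual output, element by element."""
    return [d - a for d, a in zip(desired, actual)]


def update_synapses(weights, error, inputs, rate=0.1):
    """Return a new matrix with weights[i][j] + rate * error[i] * inputs[j]
    (a scaled outer-product update); the original matrix is left unchanged."""
    return [
        [w + rate * e * x for w, x in zip(row, inputs)]
        for row, e in zip(weights, error)
    ]
```

Row i of the matrix holds the stored elements of unit i, so each unit's row is corrected in proportion to its own error term and to the input that produced it.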

Both the solution and learning phases involve a large amount of calculation and many accesses to the memory storing the elements. Neural networks have been simulated on large-scale computers, but the time needed for a single processing unit to perform all the calculations has been found to be prohibitively long; for a large network it could amount to several months of calculation. This is clearly impractical.

It is an object of the present invention to provide a computer specifically for the purpose of simulating neural networks.

According to one aspect of the present invention there is provided a computer comprising a plurality of separate data processing units each including a processor, a multi-address read-write memory and communication means, wherein the data processing units are arranged in a three-dimensional, possibly rectangular, array and the communication means of each unit is connected to the communication means of the nearest units only in each direction along the three axes of the array.

According to a second aspect of the present invention there is provided a computer comprising a plurality of separate data processing units each including a processor, a multi-address read-write memory and communication means, the communication means of each unit being connected to the communication means of at least two other data processing units, or to the communication means of at least one other data processing unit and to a data input or a data output, wherein each data processing unit operates as a neuron simulator responding to a plurality of data bits as an input and producing a plurality of data bits as an output.
The processor may include a gate array configured as a neural network solution engine capable of multiplying a plurality of input numbers by respective coefficients, summing the products to produce a single number and performing a non-linear operation on the single number to produce an output number.

The data processing unit may include a microprocessor as a controller for the unit.

The communication means may be arranged to operate serially, having a shifting register with, say, 64 stages to which the data to be communicated to another unit is transferred in parallel. Communication between data processing units is effected by connecting the shifting registers of the two units end to end in a ring and clocking the data from each shifting register into the other. The communication means may have only a single shifting register, the end terminals of which are connected selectively to pairs of terminals of the data processing unit by multiplexers. Each data processing unit may have six pairs of terminals which are connected to pairs of terminals of six other data processing units. Control of the multiplexers, for example by a microprocessor in the unit, determines the identity of the other data processing unit to which the shifting register is connected. A link between the clocks may be provided to synchronise the stepping of the data in the two connected shifting registers.
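The ring exchange can be checked with a bit-level simulation. The sketch below assumes a 64-stage register (the "say, 64 stages" above) and assumes that each register shifts towards its most significant end, so that the bit leaving the top of one register enters the bottom of the other on each clock; the names `clock_ring` and `exchange` are hypothetical.

```python
WIDTH = 64  # number of register stages, taken from the example above


def clock_ring(reg_a, reg_b, width=WIDTH):
    """One clock: both registers shift by one stage; the bit leaving the top
    of A enters the bottom of B, and the bit leaving the top of B enters A."""
    mask = (1 << width) - 1
    out_a = (reg_a >> (width - 1)) & 1
    out_b = (reg_b >> (width - 1)) & 1
    new_a = ((reg_a << 1) & mask) | out_b
    new_b = ((reg_b << 1) & mask) | out_a
    return new_a, new_b


def exchange(reg_a, reg_b, width=WIDTH):
    """Clock the ring once per stage; afterwards the two contents are swapped."""
    for _ in range(width):
        reg_a, reg_b = clock_ring(reg_a, reg_b, width)
    return reg_a, reg_b
```

Because the two registers form a single loop of 2 x 64 stages, 64 clocks move each register's contents exactly into the other, so one transfer in each direction costs the same time as a one-way transfer.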

Each data processing unit is preferably connected by its communication means only to the nearest other data processing units along each coordinate direction of the array in which the units are arranged. This simplifies the communication requirements for the units without restricting the ability of each unit to communicate with every other unit, apart from the imposition of a small time delay.

The computer may further include broadcasting means connected to a receive-only input of each data processing unit, enabling messages and commands (such as start, reset and stop) to be sent directly to each unit. The broadcasting means can also be used to transmit large amounts of data to the data processing units, for example initial synapse values for setting up the computer. Selection of the units capable of receiving certain messages from the broadcasting means may be effected by allocating multi-bit addresses to the units and providing the messages with a multi-bit address code, each unit being arranged to compare that code with its own address and to accept only messages addressed to it. The comparison of address codes may be restricted to, say, "1" bits only, so that a message for all units would have an address code of all 1's.
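One reading of the "1"-bits-only comparison is that a unit accepts a message when every bit set in its own address is also set in the message's address code; an all-1's code then reaches every unit. The sketch below assumes that rule (it is an interpretation, not stated explicitly above), and `accepts` is a hypothetical name.

```python
def accepts(unit_address, message_code):
    """Assumed rule: compare only the '1' bits of the unit's address.
    The message is accepted if all of those bits are also '1' in the code,
    so an all-ones code is accepted by every unit."""
    return (message_code & unit_address) == unit_address
```

Under this rule a code also reaches any unit whose address bits are a subset of it. If every unit is given an address with the same number of "1" bits (for example two 1's in a 4-bit address), sending a unit's own address selects that unit alone, while the all-1's code still broadcasts to all.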

In order that the invention may be fully understood 
and readily carried into effect it will now be described with 
reference to the accompanying drawings, of which:- 

FIGURE 1 is a diagram of a three-dimensional array of 
interconnected simulated neurons; 

FIGURE 1A is a simplified diagram of one such
simulated neuron illustrating its function by analogue 
computing circuitry; 

FIGURE 2 shows an example of a digital circuit used 
to form a simulated neuron according to the invention; 

FIGURE 3 is a functional diagram of the circuit of 
Figure 2; 

FIGURE 4 is a diagram of the communications chip used 
in Figure 2; 

FIGURE 5 illustrates the format of a message from one 
neuron to another; 

FIGURE 6 is a perspective diagram of a neuron 
according to Figure 2 showing the interconnection paths to 
adjacent neurons; 

FIGURE 7 is a diagram of part of the communications 
chip of Figure 4 illustrating the mechanism of serial data 
transfer; 

FIGURE 8 is a diagram used in explaining the
operation of the circuit of Figure 7; and

FIGURE 9 is a diagram of an example of the solution 
engine shown in Figure 2. 

The example of a computer to be described is designed specifically to simulate large numbers of neural networks in parallel. The overall structure of the machine is shown in Figure 1 and consists of a large number (tens or hundreds) of individual autonomous neural networks. The networks are implemented in groups using digital data processing units, represented by circles in Figure 1. A plurality of these units are interconnected to form a three-dimensional array, to which input data is applied at one side and from which output data is derived at another side. The entire system is controlled by a host computer located at one corner of the array, which issues global commands such as 'start' and 'stop' to the data processing units, is the means by which parameters are transferred to the neural networks (e.g. representing the interconnections for data to be implemented between the networks), and performs the data gathering and error reporting functions of the system. The host computer may be a conventional microprocessor or a more powerful machine such as, for example, a mini or mainframe computer.

Each data processing unit, of which a simplified diagram is shown in Figure 1A, may be regarded as a configurable neural network consisting of, for example, 256 simulated neurons each with typically 256 inputs. The inputs are represented in Figure 1A by vertical lines and the neurons by horizontal lines. At each cross-point in the network there is located a synapse, a 16-bit integer shown diagrammatically as a triangle in Figure 1A. The network is solved by applying an input vector, each member of which is an 8-bit term, to the inputs, forming for each neuron the products of the input vector terms and the corresponding synapses, and summing the products for each neuron to give a 24-bit number. This number is then operated upon with a non-linear function to give an 8-bit element of the output vector for each neuron. The non-linear function may be such as to give an output value tending towards unity as the input value increases to a maximum positive value and towards zero as the input value falls to a maximum negative value, symmetrically disposed about a middle input value where the output value is ½. One such non-linear function is $f(x) = 1/(1 + e^{-x})$. In this way, for an N x M matrix (N neurons having M inputs per neuron), an M-element input vector is transformed into an N-element output vector, this process constituting the solution of the network. The output vector from one neural network forms the input vector of one or more networks, and the process is continued until the entire system of networks has been solved. In the data processing unit the multiplications by the synapses, the additions and the non-linear functions are implemented digitally.
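The word lengths above can be illustrated with an integer sketch of one neuron. Several details are assumptions not given in the description: inputs are treated as unsigned 8-bit values, synapses as signed 16-bit values, the sum is clamped to the signed 24-bit range, and the hypothetical factor `scale` maps the 24-bit sum onto the argument of $f(x)$. Some such scaling or clamping is needed, since each 8 x 16-bit product already occupies up to 24 bits and a sum of 256 of them can need up to 32 bits.

```python
import math

ACC_MIN = -(1 << 23)      # most negative signed 24-bit value
ACC_MAX = (1 << 23) - 1   # most positive signed 24-bit value


def saturate24(value):
    """Clamp an integer sum to the signed 24-bit range (assumed overflow policy)."""
    return max(ACC_MIN, min(ACC_MAX, value))


def logistic(z):
    """Numerically stable 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron_fixed_point(inputs, synapses, scale=1 / 4096):
    """8-bit unsigned inputs times 16-bit signed synapses, summed into a
    saturating 24-bit accumulator, then mapped through the logistic function
    to an 8-bit output in 0..255 (128 at the middle input value)."""
    if len(inputs) != len(synapses):
        raise ValueError("one synapse is needed per input")
    if any(not 0 <= x <= 255 for x in inputs):
        raise ValueError("inputs must be 8-bit unsigned values")
    if any(not -32768 <= w <= 32767 for w in synapses):
        raise ValueError("synapses must be 16-bit signed values")
    acc = saturate24(sum(x * w for x, w in zip(inputs, synapses)))
    y = logistic(acc * scale)  # hypothetical scaling of the 24-bit sum
    return min(255, round(255 * y))
```

An all-zero sum gives the output 128, corresponding to the value ½ at the middle of the non-linearity.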

The learning phase is executed in a similar manner, the precise sequence of calculation steps depending on the algorithm used. In all cases the learning phase results in the updating of the synapse values stored in each data processing unit in dependence upon differences between the actual output of the system and its desired output.
The updating of the synapse values may be based on the error propagation algorithm proposed by Rumelhart and McClelland in "Parallel Distributed Processing", published by MIT Press in 1986. The error, equal to the difference between the desired output of the network and the output actually produced, is propagated backwards through the same neurons as were used in the solution process. For each neuron a local error is produced on the assumption that the local errors of the neurons connected to drive the same neuron are proportional to the output values of those neurons. Using the local errors, the values of the synapses are updated so as to tend to reduce the local errors to zero, using

$$W_{ij} \leftarrow W_{ij} + \eta\,\delta_i\,x_j$$

where $W_{ij}$ is the synapse value for the jth input $x_j$ which contributed to the local error $\delta_i$ of neuron i, and $\eta$ is a scaling factor ($\eta < 1$). In this way the adjustment of the synapse values serves to reduce the global error of the whole network towards zero.
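A two-layer example makes the backward propagation of local errors concrete. This sketch uses the standard generalised delta rule for logistic units (a hidden neuron's local error is its output derivative times the weighted sum of the local errors it feeds), which may apportion errors differently from the assumption stated above; the names `forward` and `train_step` and the value `eta=0.5` are hypothetical.

```python
import math


def logistic(z):
    """Numerically stable 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, w_hidden, w_out):
    """Solution phase for a two-layer network; returns hidden and output values."""
    h = [logistic(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = [logistic(sum(w * hj for w, hj in zip(row, h))) for row in w_out]
    return h, y


def train_step(x, target, w_hidden, w_out, eta=0.5):
    """One error-propagation step. Updates the weight lists in place and
    returns the squared error measured before the update."""
    h, y = forward(x, w_hidden, w_out)
    # Local errors of output neurons: (desired - actual) * derivative y(1 - y).
    d_out = [(t - yk) * yk * (1.0 - yk) for t, yk in zip(target, y)]
    # Local errors of hidden neurons, propagated back through the same
    # synapses used in the solution phase (computed before any update).
    d_hid = [
        hj * (1.0 - hj) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j, hj in enumerate(h)
    ]
    # W_ij <- W_ij + eta * delta_i * x_j, applied to both layers.
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += eta * d_out[k] * h[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * d_hid[j] * x[i]
    return sum((t - yk) ** 2 for t, yk in zip(target, y))
```

Repeated calls on the same example drive the squared error down, mirroring the reduction of the global error described above.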

Figure 2 shows a block diagram of one data processing unit of the type used in Figure 1. It consists of standard components (a microprocessor 1, a dynamic RAM 2 of capacity from 64k x 16 bits up to 4M x 16 bits and a static RAM 3 of capacity 2k x 8 bits) and two semi-custom gate array circuits (a solution engine 4 and a serial communications circuit 5). The components are interconnected in conventional manner by data, address and control buses. The whole unit [...] approximately 3" square.

The gate array chip 4 forming the solution engine is arranged to perform both the solution and learning operations. The serial communications circuit 5 acts as a general purpose input/output unit for communication along the three dimensions of the array, which allows the results of the [...] destination unit within the time taken by the unit to [...]; communication imposes no delay on the [...]

The microprocessor 1 acts as the controller for the unit to provide flexibility of operation. Since the multiplications and additions are performed by the solution engine, the microprocessor need not be particularly fast. The TMS 7742 microprocessor indicated in Figure 2 includes 4k [...] software to perform bootstrapping, a self test [...] microprocessor 1 in which [...] a learning phase. This [...] host (Figure 1) [...] described. Obviously, [...] decided to use the [...] coprocessor to perform [...] functions, it might be advantageous to [...] maintain [...] processing units.
Figure 3 is a functional diagram of the data processing unit [...] during the learning phase. [...] follows.

The microprocessor 1 instructs the solution engine 4 to evaluate one neuron. When the multiplications and additions are complete and the output value has been obtained, the microprocessor 1 receives the result, applies the non-linear function to it, computes the location for the output of that function and transfers the data and the address to the communications circuit 5. While the microprocessor is performing these operations, the solution engine 4 has been evaluating the next neuron, and the microprocessor then receives the result of that. The successive results derived by the data processing unit are transferred serially between the unit and the appropriate nearest neighbouring unit in the three-dimensional array. This process continues until the final results are transferred to the outputs of the entire system. When a unit receives data via the communication system, its microprocessor causes the value to be stored in the appropriate region of the dynamic RAM for use in evaluating the neurons of that unit. It will be appreciated that the non-linear function applied to the sum produced at a neuron, and the interconnections between data processing units, can be chosen freely, since they are defined in the software stored by the microprocessor. The gate array 4 forming the solution engine has no flexibility beyond the variation in the values of the synapses used in the multiplication operations, and can therefore be arranged to handle its restricted range of operations at very high speed.
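The benefit of overlapping the solution engine with the microprocessor can be estimated with a simple two-stage pipeline model. The sketch assumes, hypothetically, fixed per-neuron times in arbitrary units (`t_engine` for the multiply-and-sum, `t_micro` for the non-linear function and hand-off) and that finished sums are queued so the engine never waits for the microprocessor.

```python
def sequential_time(n, t_engine, t_micro):
    """Time for n neurons if the engine waits while the microprocessor works."""
    return n * (t_engine + t_micro)


def pipelined_time(n, t_engine, t_micro):
    """Time for n neurons when the engine starts the next neuron as soon as it
    finishes the current one (results assumed to be queued)."""
    engine_done = 0
    micro_done = 0
    for _ in range(n):
        engine_done += t_engine                              # engine finishes this neuron's sum
        micro_done = max(micro_done, engine_done) + t_micro  # micro needs the sum and must be free
    return micro_done
```

With 256 neurons, `t_engine=3` and `t_micro=2`, the overlapped schedule takes 770 units against 1280 sequentially: throughput is set by the slower of the two stages, so the fast gate array and the flexible microprocessor each only need to keep pace with the other.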

There are many possible ways of interconnecting a plurality of data processing units into an array or a net. Since it is not known at the outset to which other units the outputs of a unit will need to be transferred, it is desirable that the interconnection should allow the transfer of a message from any one unit to any other unit. However, to provide separate connections between each pair of units in a group would result in an enormous number of connections: n(n-1)/2 links for n units, against at most six links per unit when only nearest neighbours are connected. This difficulty can be overcome by arranging to pass messages from each unit to each of its nearest neighbours, from which by successive transfers a message can [...]
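Passing a message by successive nearest-neighbour transfers can be sketched as a route through the array. The description does not fix an order of transfers; this sketch assumes dimension-order routing (correct the x coordinate first, then y, then z), a common choice, and `route` is a hypothetical name.

```python
def route(src, dst):
    """Dimension-order route between two units of a 3-D array: step along x,
    then y, then z, one nearest-neighbour hop at a time. Returns every unit
    visited, including the source and the destination."""
    path = [tuple(src)]
    cur = list(src)
    for axis in range(3):
        step = 1 if dst[axis] > cur[axis] else -1
        while cur[axis] != dst[axis]:
            cur[axis] += step
            path.append(tuple(cur))
    return path
```

The number of hops is |dx| + |dy| + |dz|, so in an n x n x n array no message needs more than 3(n - 1) transfers; this is the small time delay paid for connecting each unit only to its nearest neighbours.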
In the operation of the serial communications system, messages are passed by connecting the ends of the transmitting and receiving message registers in the communication circuits of the units concerned so as to form a continuous loop, and the contents of the registers are then swapped by shifting the data out of one register serially into the other.

Figure 7 is a diagram showing the principle of the serial communication system, in which a register 30 in one unit is connected by conductors 32 and 33 to a register 31 in the other unit to form a continuous loop. A third conductor 34 is connected between the registers to ensure the synchronism of the clocks stepping the bits along in the two registers. For such a transfer one unit is designated the 'master' and the other the 'slave', so that the clock of the master controls the clock of the slave. This master/slave designation is purely arbitrary; here the unit nearer the host computer is designated as the master and that further away as the slave.

Since there are six possible directions in which data can be transmitted from a unit to another unit, it would be possible to provide six separate registers in the communication circuit, one for transfer in each of the six possible directions. However, it is more economical to provide a single communication register and to use a multiplexer and a demultiplexer to connect the end terminals of the register selectively to the six possible other units. Another multiplexer handles the clock signal. Figure 8 shows how this could be done.
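The choice of multiplexer setting and of master unit for a given transfer can be expressed in a few lines. The sketch assumes a hypothetical numbering of the six ports and places the host at array coordinate (0, 0, 0), the corner of Figure 1; "nearer the host" is measured as the number of nearest-neighbour hops, which for two adjacent units always differs by exactly one, so the master is unambiguous.

```python
# Hypothetical port numbering for the six nearest-neighbour directions.
PORTS = {
    (1, 0, 0): 0, (-1, 0, 0): 1,
    (0, 1, 0): 2, (0, -1, 0): 3,
    (0, 0, 1): 4, (0, 0, -1): 5,
}


def mux_select(unit, neighbour):
    """Port to which the multiplexers should connect the single shifting
    register of `unit` in order to exchange data with `neighbour`."""
    delta = tuple(n - u for u, n in zip(unit, neighbour))
    if delta not in PORTS:
        raise ValueError("units are not nearest neighbours")
    return PORTS[delta]


def master_of(a, b, host=(0, 0, 0)):
    """Of two adjacent units, the one nearer the host corner is the master,
    whose clock steps both registers."""
    dist_a = sum(abs(p - q) for p, q in zip(a, host))
    dist_b = sum(abs(p - q) for p, q in zip(b, host))
    return a if dist_a < dist_b else b
```

For example, unit (1, 1, 1) reaches (1, 2, 1) through port 2, and in the pair (2, 3, 4) and (2, 3, 5) the unit (2, 3, 4) supplies the clock.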

Should a data processing unit be non-functional or busy, no message can be passed to it; the message will remain in the transmitting communication register because the exchange of ready signals preliminary to the transfer will not have taken place. If this occurs, the communication circuit is arranged to reroute the message via other units. In this way data can be arranged to bypass non-functional units or units which are busy. This recovery procedure may be defined purely in the software of the microprocessors and can be arranged to [...]
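Rerouting around an unavailable unit can be illustrated with a breadth-first search over the array. This is a swapped-in, centralised technique chosen only for clarity: it assumes a cubic array of side `size` and full knowledge of which units are unavailable, whereas the units described above detect a failed or busy neighbour locally through the absent ready signals. `reroute` is a hypothetical name.

```python
from collections import deque


def reroute(src, dst, size, unavailable):
    """Shortest nearest-neighbour path from src to dst in a size x size x size
    array that avoids every unit in `unavailable`; None if no path exists."""
    if dst in unavailable:
        return None
    prev = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            path = []
            while cur is not None:   # walk back to the source
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        x, y, z = cur
        for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            nxt = (x + dx, y + dy, z + dz)
            if (all(0 <= c < size for c in nxt)
                    and nxt not in prev and nxt not in unavailable):
                prev[nxt] = cur
                queue.append(nxt)
    return None
```

Blocking the middle unit of a straight two-hop route forces a detour of four hops, since any path between the two units must still have even length.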
[...] second data control unit 46 to the microprocessor 1 (Figure 2) for application of the non-linear function to the value and the derivation of the term of the output vector of the data processing unit, as described above.

The solution engine also includes an instruction decoder 47 which causes the engine to execute the predetermined sequence of operations outlined above for producing the neuron values. It also enables the input vectors and the original and updated synapse values to be stored in a planned series of addresses in the DRAM 2.
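The sizes given earlier fit together neatly: 256 neurons with 256 inputs each need 65,536 16-bit synapses, exactly the smallest DRAM configuration of 64k x 16 bits. A planned series of addresses could, for example, be a row-major layout in which each neuron's synapses occupy consecutive words so that the engine can step through them in order; the layout and the names `synapse_address` and `words_needed` are assumptions for illustration only.

```python
def words_needed(n_neurons, n_inputs):
    """16-bit words needed to hold one synapse per neuron input."""
    return n_neurons * n_inputs


def synapse_address(neuron, input_index, n_inputs=256, base=0):
    """Word address of synapse W[neuron][input_index] in a hypothetical
    row-major layout: all synapses of one neuron are consecutive."""
    if not 0 <= input_index < n_inputs:
        raise ValueError("input index out of range")
    return base + neuron * n_inputs + input_index
```

A larger DRAM (up to 4M x 16 bits) would leave room for bigger networks or for keeping both the original and the updated synapse values.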
1. A computer comprising a plurality of separate data processing units each including a processor, a multi-address read-write memory and communication means, wherein the data processing units are arranged in a three-dimensional, possibly rectangular, array and the communication means of each unit is connected to the communication means of the nearest units only in each direction along the three axes of the array.

2. A computer comprising a plurality of separate data processing units each including a processor, a multi-address read-write memory and communication means, the communication means of each unit being connected to the communication means of at least two other data processing units, or to the communication means of at least one other data processing unit and to a data input or a data output, wherein each data processing unit operates as a neuron simulator responding to a plurality of data bits as an input and producing a plurality of data bits as an output.

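3. A computer according to claim 2 wherein each processor includes a gate array configured as a neural network solution engine capable of multiplying a plurality of input numbers by respective coefficients, summing the products to produce a single number and performing a non-linear operation on the single number to produce an output number.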

4. A computer according to claim 2 or 3 wherein each data processing unit includes a microprocessor [...] the unit and determine to which [...] is to be [...]
5. A computer according to any preceding claim wherein the communication means of each data processing unit includes a shifting register which is arranged to provide parallel data transfer to and from other components of the particular data processing unit, and is connected to provide serial data transfer to and from the shifting register of the communication means of the or each other data processing unit to which it is connected.

6. A computer according to claim 5 wherein the shifting registers of the communication means of two data processing units which are connected together are connected in a loop, so that while data is being transferred from one register to the other, data is also being transferred from the other register to the one.

7. A computer according to claim 6 wherein in the communication means of each data processing unit there are provided a single shifting register and two multiplexers for selectively connecting the serial input and output of the shifting register to the outputs and inputs of the shifting registers of the communication means of other data processing units.

8. A computer according to claim 7 wherein the communication means of each data processing unit further includes a third multiplexer for linking the clock of the communication means to the clock of another communication means so as to synchronise the stepping of the data of the two shifting registers connected together in a loop.

9. A computer according to any one of claims 2 to 8 
wherein each data processing unit has six pairs of terminals 
for connecting the communication means of the particular data 
processing unit to the communication means of six other data 
processing units. 
10. A computer according to claim [...] wherein the data processing units are arranged in a three-dimensional rectangular array and each unit is connected [...] to the nearest units in the three coordinate directions.

11. A computer according to any one of claims 2 to 10 further including broadcasting means connected to a receive-only input of each data processing unit.

12. A computer according to claim 11 wherein each data processing unit has allocated to it a multi-bit address and includes message selection means connected to the receive-only input for accepting only those messages from the broadcasting means which are addressed to the particular processing unit.

13. A computer substantially as described herein and as illustrated by the accompanying drawings.